Main focus of the EDA is to find datasets which can be used to train a machine learning model. But it is imporant to create a better understanding of the environment and sets you are dealing with. In the upcoming section various datasets will be explored and valuable features will be extracted. These will all be comprised together to create the new dataset.
#import libraries
import numpy as np #used for mathematical functions
import pandas as pd #used for data
import matplotlib #plotting library
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn as sk
import statsmodels.api as sm
print('numpy version:', np.__version__)
print('pandas version:', pd.__version__)
print('scikit-learn version:', sk.__version__)
print('matplotlib version:', matplotlib.__version__)
numpy version: 1.20.1 pandas version: 1.2.4 scikit-learn version: 0.24.1 matplotlib version: 3.3.4
df_chaos04 = pd.read_csv('/Users/Matt/Desktop/AI/CHALLENGE1/ongevallen-2004.csv',
low_memory=0
#header=0,
#index_col=0)
).iloc[:]
pd.set_option('display.max.columns', None)
df_chaos04.head()
| Longitude | Latitude | OngevalID | Communicatie_Ref | ProcesverbaalOpgem | Afloop3 | AantalPartijen | Aard | GekoppledNiveau | Wegsituatie | WegsituatieAnders | BebouwdeKom | AangegevenMaxSnelheid | Wegverlichting | Wegverharding | Wegverharding_Anders | Wegdekken | Wegdekken_Anders | Lichtgesteldheid | Zichtafstand | Weergesteldheid1 | Weergesteldheid2 | BijzonderheidTypeVerkeersmaatregel1 | BijzonderheidTypeVerkeersmaatregel2 | BijzonderheidTypeVerkeersmaatregel3 | BijzonderheidTypeVerkeersmaatregel_An | BijzonderheidTypeInfrastructuur1 | BijzonderheidTypeInfrastructuur2 | BijzonderheidTypeInfrastructuur3 | BijzonderheidTypeInfrastructuur_An | BijzonderheidTidelijkAard1 | BijzonderheidTidelijkAard2 | BijzonderheidTidelijkAard3 | BijzonderheidTidelijkAard_An | JunctieID | WegvakID | Hectometer | Huisnummer | GemeenteID | GemeenteNaam | Provincie | KaderwetgebiedNaam | PolitieDistrictNaam | BasiseenheidNaam | WijkteamNaam | WaterschaapNaam | WijkNaam | XCoordinate | YCoordinate | PartijID | Volgnummer_Partijen | Doorrijder | ObjectType | ObjectType_Anders | VoertuigVerzekerd | Schade | AanhangenWagen | GevaarlijkStoffenPlaat | VoertuigVerlichting | AantalPassagiers | Geslacht_BstuurdrOfVoetgngr | RibewijsGeldg | RijbewijsCategry | RibewijsBeginnr | BromfietsCertificate | VastgelegdePlaats1 | VastgelegdePlaats2 | VastgelegdePlaats_An | VoorgenomenBeweging | AangrijppuntType | Aangrijppunt1 | Aangrijppunt2 | Beweging1 | Beweging2 | Beweging_An | Inrichting | EersteToelatingDatum | MassaLeegVoertuig | Breedte | Lengte | APKGekeurd | Wegvak_BeginDat | Wegvak_EndDat | WegBeheerder | WegNummer | WegdeelLetter | HectoLetter | Baansubsoort | RelatievePositie | WegvakRichting | StraatType | StraatNaam | WoonplaatsNaam | LinkerkantHuisNummer | RechterKantHuisNummer | EerstHuisNummerLinks | EerstHuisNummerRechts | LaatsHuisNummerLinks | LaatsHuisNummerRechts | KlokAanduingAlsBegin | KlokAanduingAlsEnd | SamengesteldLocatieType_Wegvak | SamengesteldLocatieNummer_Wegvak | BeginpuntAfstand | EndpuntAfstand | BegKM | EndKM | WegbeheerderSortRijk | WegbeheerderSortProvincie | WegbeheerderSortGemeente | WegbeheerderSortWaterSchap | WegbeheerderSortOverig | AantalWegvakkenMetJunctie | ActueellWegvakkenMetJunctie | SamengesteldLocatieType_Junctie | SamengesteldLocatieNummer_Junctie | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3.364700 | 51.313763 | 120041872038 | 04-176360 | NaN | Uitsluitend materiele schade | 2 | Vast voorwerp | Ongeval gekoppeld op straat niveau | Rechte weg | NaN | Binnen | 30.0 | NaN | Klinkers | NaN | Droog | NaN | Daglicht | NaN | Droog | NaN | NaN | NaN | NaN | NaN | Parkeervoorziening | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 27142016.0 | NaN | NaN | 1714 | Sluis Z | Zeeland | NaN | REGIO ZEELAND | NaN | NaN | NaN | NaN | 14002.265 | 371343.44 | NaN | NaN | NaN | Overig vast object | NaN | NaN | NaN | NaN | NaN | Niet aanwezig/nvt | NaN | NaN | NaN | NaN | NaN | NaN | Trottoir/berm | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 20030501.0 | NaN | G | NaN | NaN | NaN | NaN | NaN | NaN | PTT-straat | Sint Annastraat | SLUIS | E | NaN | 126.0 | NaN | 126.0 | NaN | 3.0 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 4.310351 | 51.527046 | 120041856048 | 04-177712 | NaN | Letsel | 1 | Eenzijdig | Ongeval exact gekoppeld aan BN | Bocht | NaN | Buiten | 80.0 | Niet aanwezig | Overig asfalt | NaN | Droog | NaN | Daglicht | NaN | Droog | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 159187001.0 | NaN | NaN | 748 | Bergen op Zoom | Noord-Brabant | REGIO NOORD BRABANT-OV | REGIO MIDDEN EN WEST-BRABANT | TEAM HALSTEREN/BERGEN OP ZOOM | NaN | NaN | NaN | 80273.516 | 393669.47 | 21102277.0 | 1.0 | NaN | Motor | NaN | J | J | NaN | NaN | Onbekend | NaN | M | J | A1 | NaN | NaN | Rijbaan | NaN | NaN | Vooruit | V | Rechterflank | Midden voor | Stilstand | Uitrollen | NaN | geen zijspan toegestaan | 19960510.0 | 210.0 | NaN | NaN | NaN | 19980216.0 | NaN | G | NaN | NaN | NaN | NaN | NaN | NaN | PTT-straat | Schansbaan | BERGEN OP ZOOM | NaN | NaN | NaN | NaN | NaN | NaN | 4.0 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 4.048437 | 51.499148 | 120041879850 | 04-186040 | NaN | Uitsluitend materiele schade | 2 | Kop/staart | Ongeval exact gekoppeld aan BN | Rechte weg | NaN | Binnen | 50.0 | Niet brandend | Overig asfalt | NaN | Droog | NaN | Daglicht | NaN | Droog | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 123181012.0 | NaN | 2A | 703 | Reimerswaal | Zeeland | NaN | REGIO ZEELAND | NaN | NaN | REIMERSWAAL | NaN | 62042.676 | 390867.44 | NaN | NaN | NaN | Personenauto | NaN | J | J | NaN | NaN | Onbekend | NaN | NaN | J | B | NaN | NaN | Trottoir/berm | NaN | NaN | Achteruit | V | Rechts voor | NaN | NaN | NaN | ACHTERUIT | combinatiewagen | 19991111.0 | 1405.0 | NaN | NaN | J | 20030201.0 | NaN | G | NaN | NaN | NaN | NaN | NaN | NaN | PTT-straat | Industrieweg | YERSEKE | E | O | 2.0 | 1.0 | 4.0 | 11.0 | 2.0 | 8.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 3.364700 | 51.313763 | 120041871785 | 04-178227 | NaN | Uitsluitend materiele schade | 2 | Vast voorwerp | Ongeval gekoppeld op straat niveau | Rechte weg | NaN | Binnen | 50.0 | NaN | Overig asfalt | NaN | NaN | NaN | Daglicht | NaN | Onbekend | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 27142016.0 | NaN | NaN | 1714 | Sluis Z | Zeeland | NaN | REGIO ZEELAND | NaN | NaN | NaN | NaN | 14002.265 | 371343.44 | NaN | NaN | J | Onbekend voertuig i.g.v. doorrijder | NaN | NaN | NaN | NaN | NaN | Onbekend | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 20030501.0 | NaN | G | NaN | NaN | NaN | NaN | NaN | NaN | PTT-straat | Sint Annastraat | SLUIS | E | NaN | 126.0 | NaN | 126.0 | NaN | 3.0 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 3.371984 | 51.313218 | 120041921852 | 04-209663 | NaN | Uitsluitend materiele schade | 2 | Vast voorwerp | Ongeval exact gekoppeld aan BN | Rechte weg | NaN | Buiten | 50.0 | Brandend | Beton | NaN | Nat | NaN | Daglicht | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Parkeervoorziening | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 28142013.0 | NaN | NaN | 1714 | Sluis Z | Zeeland | NaN | REGIO ZEELAND | NaN | NaN | NaN | NaN | 14508.313 | 371268.75 | NaN | NaN | NaN | Overig vast object | NaN | NaN | NaN | NaN | NaN | Niet aanwezig/nvt | NaN | NaN | NaN | NaN | NaN | NaN | Trottoir/berm | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 20030501.0 | NaN | G | NaN | NaN | NaN | NaN | NaN | NaN | PTT-straat | Industrieweg | SLUIS | NaN | NaN | NaN | NaN | NaN | NaN | 12.0 | 7.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
As you can see the dataset contains many features. Lets take a better look at it.
print('Shape of the dataset: {}'.format(df_chaos04.shape))
Shape of the dataset: (146813, 116)
This is a relatively large dataset. Lets display all the features it contains.
df_chaos04.info(verbose=True, show_counts=True)
<class 'pandas.core.frame.DataFrame'> RangeIndex: 146813 entries, 0 to 146812 Data columns (total 116 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Longitude 146813 non-null float64 1 Latitude 146813 non-null float64 2 OngevalID 146813 non-null int64 3 Communicatie_Ref 146802 non-null object 4 ProcesverbaalOpgem 25323 non-null object 5 Afloop3 146813 non-null object 6 AantalPartijen 146813 non-null int64 7 Aard 146813 non-null object 8 GekoppledNiveau 146813 non-null object 9 Wegsituatie 145867 non-null object 10 WegsituatieAnders 112 non-null object 11 BebouwdeKom 146204 non-null object 12 AangegevenMaxSnelheid 143528 non-null float64 13 Wegverlichting 137412 non-null object 14 Wegverharding 143820 non-null object 15 Wegverharding_Anders 1268 non-null object 16 Wegdekken 138446 non-null object 17 Wegdekken_Anders 1034 non-null object 18 Lichtgesteldheid 146480 non-null object 19 Zichtafstand 28 non-null object 20 Weergesteldheid1 146383 non-null object 21 Weergesteldheid2 472 non-null object 22 BijzonderheidTypeVerkeersmaatregel1 33013 non-null object 23 BijzonderheidTypeVerkeersmaatregel2 7326 non-null object 24 BijzonderheidTypeVerkeersmaatregel3 1435 non-null object 25 BijzonderheidTypeVerkeersmaatregel_An 4511 non-null object 26 BijzonderheidTypeInfrastructuur1 13145 non-null object 27 BijzonderheidTypeInfrastructuur2 184 non-null object 28 BijzonderheidTypeInfrastructuur3 3 non-null object 29 BijzonderheidTypeInfrastructuur_An 8165 non-null object 30 BijzonderheidTidelijkAard1 7064 non-null object 31 BijzonderheidTidelijkAard2 424 non-null object 32 BijzonderheidTidelijkAard3 14 non-null object 33 BijzonderheidTidelijkAard_An 487 non-null object 34 JunctieID 59882 non-null float64 35 WegvakID 86931 non-null float64 36 Hectometer 17667 non-null float64 37 Huisnummer 8034 non-null object 38 GemeenteID 146813 non-null int64 39 GemeenteNaam 146813 non-null object 40 Provincie 146813 non-null object 41 KaderwetgebiedNaam 124336 non-null object 42 PolitieDistrictNaam 146813 non-null object 43 BasiseenheidNaam 65210 non-null object 44 WijkteamNaam 62951 non-null object 45 WaterschaapNaam 19142 non-null object 46 WijkNaam 10237 non-null object 47 XCoordinate 146813 non-null float64 48 YCoordinate 146813 non-null float64 49 PartijID 11374 non-null float64 50 Volgnummer_Partijen 11374 non-null float64 51 Doorrijder 17274 non-null object 52 ObjectType 145183 non-null object 53 ObjectType_Anders 1359 non-null object 54 VoertuigVerzekerd 85614 non-null object 55 Schade 84608 non-null object 56 AanhangenWagen 1900 non-null object 57 GevaarlijkStoffenPlaat 2 non-null object 58 VoertuigVerlichting 146813 non-null object 59 AantalPassagiers 0 non-null float64 60 Geslacht_BstuurdrOfVoetgngr 10976 non-null object 61 RibewijsGeldg 89078 non-null object 62 RijbewijsCategry 88185 non-null object 63 RibewijsBeginnr 5 non-null object 64 BromfietsCertificate 774 non-null object 65 VastgelegdePlaats1 133428 non-null object 66 VastgelegdePlaats2 7 non-null object 67 VastgelegdePlaats_An 2328 non-null object 68 VoorgenomenBeweging 109286 non-null object 69 AangrijppuntType 102904 non-null object 70 Aangrijppunt1 101646 non-null object 71 Aangrijppunt2 5133 non-null object 72 Beweging1 101159 non-null object 73 Beweging2 785 non-null object 74 Beweging_An 362 non-null object 75 Inrichting 84249 non-null object 76 EersteToelatingDatum 86983 non-null float64 77 MassaLeegVoertuig 86096 non-null float64 78 Breedte 14425 non-null float64 79 Lengte 2432 non-null float64 80 APKGekeurd 66269 non-null object 81 Wegvak_BeginDat 86931 non-null float64 82 Wegvak_EndDat 4357 non-null float64 83 WegBeheerder 86931 non-null object 84 WegNummer 26353 non-null object 85 WegdeelLetter 26353 non-null object 86 HectoLetter 1484 non-null object 87 Baansubsoort 27420 non-null object 88 RelatievePositie 28303 non-null object 89 WegvakRichting 19171 non-null object 90 StraatType 86931 non-null object 91 StraatNaam 86931 non-null object 92 WoonplaatsNaam 86931 non-null object 93 LinkerkantHuisNummer 43482 non-null object 94 RechterKantHuisNummer 43923 non-null object 95 EerstHuisNummerLinks 31888 non-null float64 96 EerstHuisNummerRechts 32457 non-null float64 97 LaatsHuisNummerLinks 31888 non-null float64 98 LaatsHuisNummerRechts 32457 non-null float64 99 KlokAanduingAlsBegin 86931 non-null float64 100 KlokAanduingAlsEnd 86931 non-null float64 101 SamengesteldLocatieType_Wegvak 10749 non-null object 102 SamengesteldLocatieNummer_Wegvak 10749 non-null float64 103 BeginpuntAfstand 26321 non-null float64 104 EndpuntAfstand 26321 non-null float64 105 BegKM 26321 non-null float64 106 EndKM 26321 non-null float64 107 WegbeheerderSortRijk 59882 non-null object 108 WegbeheerderSortProvincie 59882 non-null object 109 WegbeheerderSortGemeente 59882 non-null object 110 WegbeheerderSortWaterSchap 59882 non-null object 111 WegbeheerderSortOverig 59882 non-null object 112 AantalWegvakkenMetJunctie 59882 non-null float64 113 ActueellWegvakkenMetJunctie 59882 non-null object 114 SamengesteldLocatieType_Junctie 20587 non-null object 115 SamengesteldLocatieNummer_Junctie 20587 non-null float64 dtypes: float64(30), int64(3), object(83) memory usage: 129.9+ MB
From here we can actually get more insight into usable feature sets. These features can be split into usable groups.
Because the nature of this research is traffic incidents, we have to know the cause. Lets take a deeper look into Aard
cause_of_incident = list(set(df_chaos04['Aard']))
cause_of_incident
['Dier', 'Eenzijdig', 'Onbekend', 'Kop/staart', 'Frontaal', 'Voetganger', 'Los voorwerp', 'Flank', 'Geparkeerd voertuig', 'Vast voorwerp']
Here we can see the 10 main causes, of which the final one is of type unknown. Lets take a look at the distribution. But first checking for null values.
df_chaos04['Aard'].isnull().sum()
0
Luckily every datapoint has an established type. Lets look at the distribution.
fig, ax = plt.subplots(1, figsize=(15,10))
ax.set_title('Nature Of Traffic Incidents (The Netherlands, 2004)')
ax.set_xlabel(' Type ')
ax.set_ylabel(' Count ')
ax = df_chaos04['Aard'].value_counts().head(10).plot(kind='bar')
Here we can see that Flank is the common cause for traffic incidents. This translates into accidents from the side
Another thing to note is there is a clear distinction between Kop/Staart, Frontaal and Eenzijdig. It is curious to see on what base this is established.
Kop/Staart: Frontal or rear ended crash
Frontaal: Head to head crash
Eenzijdig: Uni directional crash
An interesting combination would be to check the damages related to the type of accident.
First lets check for null values in Schade
df_chaos04['Schade'].isnull().sum()
62205
There are some, about 30% of the total. Lets check the types.
df_chaos04['Schade'].value_counts()
J 77371 N 7237 Name: Schade, dtype: int64
In this case it seems very clear there is a distinction between 3 types. Yes, No and NaN.
First lets change all NaN to O and then lets incorporate this data into the previous chart.
df_snan = df_chaos04.copy(deep=True)
df_snan['Schade'] = df_snan['Schade'].replace(np.nan, 'O')
fig, axes = plt.subplots(1, 2, figsize=(15, 12), sharey=True)
df_chaos04.groupby(['Aard', 'Schade']).size().unstack().plot(kind='bar', stacked=True, ax=axes[0])
df_snan.groupby(['Aard', 'Schade']).size().unstack().plot(kind='bar', stacked=True, ax=axes[1])
axes[0].set_title('With NaN Damages')
axes[1].set_title('Replaced Unknown Damages')
plt.show()
This is a very important insight for the dataset we are dealing with, At first glance it seems that obviously almost every accident involves some kind of damage. But after converting the NaN datapoints we can tell that it is not always the case!
Vast voorwerp (Solid object) is the perfect example for this. The law enforcement officers in many cases cannot note the damages done to the vehicle because they escape the crime scene in these situations.
Lets also take a look at gender, and if this has any influence
df_chaos04['Geslacht_BstuurdrOfVoetgngr'].value_counts()
M 6392 V 4584 Name: Geslacht_BstuurdrOfVoetgngr, dtype: int64
df_chaos04['Geslacht_BstuurdrOfVoetgngr'].isnull().sum()
135837
Lets make a copy of the dataset, change the NaN to O and compare the two results in a graph.
df_gnan = df_chaos04.copy(deep=True)
df_gnan['Geslacht_BstuurdrOfVoetgngr'] = df_gnan['Geslacht_BstuurdrOfVoetgngr'].replace(np.nan, 'O')
fig, axes = plt.subplots(1, 2, figsize=(15, 12), sharey=True)
df_chaos04.groupby(['Aard', 'Geslacht_BstuurdrOfVoetgngr']).size().unstack().plot(kind='bar', color=['b', 'g'], stacked=False, ax=axes[0])
df_gnan.groupby(['Aard', 'Geslacht_BstuurdrOfVoetgngr']).size().unstack().plot(kind='bar', color=['b', 'r', 'g'], stacked=False, ax=axes[1])
axes[0].set_title('With NaN Gender')
axes[1].set_title('Replaced Unknown Gender')
plt.show()
From this we can assume that this data would not be feasible to use in these circumstances. Too many datapoints are unknown to make an accurate assumption.
Finally lets take a closer look at the final conclusions from these accidents. They can be found within Afloop3
df_chaos04['Afloop3'].value_counts()
Uitsluitend materiele schade 119053 Letsel 27013 Dodelijk 747 Name: Afloop3, dtype: int64
df_chaos04['Afloop3'].isnull().sum()
0
As you can see all accidents came to a conclusive report of the situation. Lets take a look at this.
df_chaos04.groupby(['Aard', 'Afloop3']).size().unstack().plot(kind='bar', color=['r', 'orange', 'g'], stacked=False, figsize=(15,10), title='Resulting damages per accident')
plt.show()
We can see that luckily only a small part of each case ends in a dealdy situation. It is interesting to see what difference there was.
It is possible to plot all datapoints into a map. But it is not feasible, because we know for sure these incidents happen all around the netherlands, and it will take quite some computing power to display all 160k datapoints.
Therefore we will make use of Longitude and Latitude for later comparison. Lets focus on data that can be classified for now. Lets see the provinces and their relative cases of traffic incidents.
df_chaos04['Provincie'].isnull().sum()
0
No null values so we can continue by displaying the categorical data.
from pandas.plotting import table
fig, axes = plt.subplots(1, 2, figsize=(10, 10), sharey=True)
#table 1
df_chaos04.groupby('Provincie').size().plot(kind='pie',
colors=['#488f31', '#729d42', '#96ab58', '#b5b970', '#d1c88c', '#ead8a9', '#ffeac8'
, '#f7d1a8', '#f1b78c', '#eb9c74', '#e58063', '#de6157'], ax=axes[0])
axes[0].set_title('Distribution of incidents per Province')
axes[0].set_ylabel('')
#table 2
plt.axis('off')
tbl = table(axes[1], df_chaos04['Provincie'].value_counts(), loc='right')
tbl.auto_set_font_size(False)
tbl.set_fontsize(12)
plt.show()
It is interesting to see that in the pie chart None is visible, while in the table it is not present. Lets investigate this further.
df_chaos04['Provincie'][df_chaos04['Provincie'] == 'None'].count()
0
So it seems like there are no None, we dont need further cleaning for this. Lets continue and take a look at the type of accidents per province.
df_chaos04.groupby(['Provincie', 'Aard']).size().unstack().plot(kind='bar',
color=['#003f5c', '#2f4b7c', '#665191', '#a05195', '#d45087'
, '#f95d6a', '#ff7c43', '#ffa600', '#f1bca0', '#f1f6c0'], stacked=True, figsize=(15,10), title='Type of accident per Province')
plt.show()
Here we can clearly see that there are 3 main reasons:
This is relatable to accidents seen daily.
First lets take a look at the type of vehicles there are.
df_chaos04['ObjectType'].value_counts()
Personenauto 77529 Bestelauto 10710 Onbekend voertuig i.g.v. doorrijder 9449 Overig vast object 8746 Overig wegmeubilair 6894 Fiets 5584 Lichtmast 4299 Boom 3661 Bromfiets 3382 Vrachtauto 3085 Dier 2684 Motor 2087 Trekker 1475 Los voorwerp 1311 Bus 1109 Snorfiets 899 Voetganger 872 Trekker met oplegger 428 Landbouwvoertuig 398 Brommobiel 344 Trein/tram 237 Name: ObjectType, dtype: int64
With this we can create a new dataset about the vehicle
df_vehicle = df_chaos04.copy(deep=True)
df_vehicle = df_vehicle[['ObjectType', 'ObjectType_Anders', 'Inrichting', 'EersteToelatingDatum', 'MassaLeegVoertuig', 'Breedte', 'Lengte', 'APKGekeurd']]
df_vehicle.head()
| ObjectType | ObjectType_Anders | Inrichting | EersteToelatingDatum | MassaLeegVoertuig | Breedte | Lengte | APKGekeurd | |
|---|---|---|---|---|---|---|---|---|
| 0 | Overig vast object | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Motor | NaN | geen zijspan toegestaan | 19960510.0 | 210.0 | NaN | NaN | NaN |
| 2 | Personenauto | NaN | combinatiewagen | 19991111.0 | 1405.0 | NaN | NaN | J |
| 3 | Onbekend voertuig i.g.v. doorrijder | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Overig vast object | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Here we can already see our first problem if we want to incorporate the date of the vehicle. Lets see how many of these entries are empty.
df_vehicle['EersteToelatingDatum'].isnull().sum()
59830
This is about 30 percent of the dataset. We will keep as is because this might be to the fact that the vehicle escaped the scene and there is no information about it. To see a distribution of we will have to change the date time.
df_vehicle['EersteToelatingDatum'][79931]
20020920.0
Here we can see the format of yyyymmdd
df_vehicle['EersteToelatingDatum'] = pd.to_datetime(df_vehicle['EersteToelatingDatum'], format='%Y%m%d')
df_vehicle['EersteToelatingDatum'][79931]
Timestamp('2002-09-20 00:00:00')
temp_dates = pd.DataFrame(df_vehicle['EersteToelatingDatum'].value_counts().rename_axis('EersteToelatingDatum').reset_index(name='Total'))
df_date_total = temp_dates.set_index('EersteToelatingDatum')
df_date_total
| Total | |
|---|---|
| EersteToelatingDatum | |
| 1992-01-02 | 156 |
| 2000-01-03 | 143 |
| 2004-01-02 | 121 |
| 1994-01-03 | 118 |
| 1998-01-02 | 109 |
| ... | ... |
| 1988-12-03 | 1 |
| 1981-06-26 | 1 |
| 1971-03-05 | 1 |
| 2004-11-01 | 1 |
| 1983-09-14 | 1 |
6384 rows × 1 columns
cars_per_date = df_vehicle.groupby(['EersteToelatingDatum']).count()
del cars_per_date['Inrichting']
del cars_per_date['MassaLeegVoertuig']
del cars_per_date['Breedte']
del cars_per_date['Lengte']
df_total_car = pd.concat([df_date_total, cars_per_date], axis=1)
df_total_car
| Total | ObjectType | ObjectType_Anders | APKGekeurd | |
|---|---|---|---|---|
| EersteToelatingDatum | ||||
| 1920-06-30 | 1 | 1 | 0 | 0 |
| 1934-09-15 | 1 | 1 | 0 | 1 |
| 1941-06-30 | 1 | 1 | 0 | 0 |
| 1942-06-30 | 1 | 1 | 0 | 0 |
| 1942-08-01 | 1 | 1 | 0 | 1 |
| ... | ... | ... | ... | ... |
| 2005-06-29 | 1 | 1 | 0 | 0 |
| 2005-07-04 | 1 | 1 | 0 | 0 |
| 2005-08-02 | 1 | 1 | 0 | 0 |
| 2005-12-15 | 1 | 1 | 0 | 0 |
| 2006-10-24 | 1 | 1 | 0 | 0 |
6384 rows × 4 columns
df_total_car.plot(figsize=(15,10), color=['#3d708f', '#88a037'], title='Vehicles involved in accidents per import year')
plt.show()
Here we can see a surprising increase of not appropriately registered vehicles increased significantly. This could have been an influx of foreign vehicles, but it is not sure for now.
As we can see there is some usable information in this dataset, but lets take a look at a different one to compare the usable features.
This dataset is comprised of 2 main sets, and references. We will focus on the accidents first to see if the dataset is of any use. Lets import it.
df_acc15 = pd.read_csv('/Users/Matt/Desktop/AI/CHALLENGE1/Ongevallen2/ongevallen.txt',
low_memory=0,
sep=','
#header=0,
#index_col=0)
).iloc[:]
df_acc15.head()
| VKL_NUMMER | REGNUMMER | PVOPGEM | DATUM_VKL | DAG_CODE | MND_NUMMER | JAAR_VKL | TIJDSTIP | UUR | DDL_ID | AP3_CODE | AP4_CODE | AP5_CODE | ANTL_SLA | ANTL_DOD | ANTL_GZH | ANTL_SEH | ANTL_GOV | ANTL_PTJ | ANTL_TDT | MNE_CODE | AOL_ID | NIVEAUKOP | WSE_ID | WSE_AN | BEBKOM | MAXSNELHD | WVL_ID | WVG_ID | WVG_AN | WDK_ID | WDK_AN | LGD_ID | ZAD_ID | WGD_CODE_1 | WGD_CODE_2 | BZD_ID_VM1 | BZD_ID_VM2 | BZD_ID_VM3 | BZD_VM_AN | BZD_ID_IF1 | BZD_ID_IF2 | BZD_ID_IF3 | BZD_IF_AN | BZD_ID_TA1 | BZD_ID_TA2 | BZD_ID_TA3 | BZD_TA_AN | JTE_ID | WVK_ID | HECTOMETER | FK_VELD5 | HUISNUMMER | GME_ID | GME_NAAM | PVE_CODE | PVE_NAAM | KDD_NAAM | PLT_NAAM | DIENSTCODE | DIENSTNAAM | DISTRCODE | DISTRNAAM | DAGTYPE | IND_ALC | WEEKNR | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20150066102 | 2015298854 | NaN | NaN | NaN | NaN | 2015 | NaN | NaN | NaN | UMS | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3 | NaN | NaN | 0 | K | 5.0 | NaN | BU | 50.0 | 2.0 | 1.0 | NaN | 1.0 | NaN | NaN | NaN | D | NaN | NaN | NaN | NaN | NaN | 280.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 169315140.0 | NaN | NaN | JTE0169315140 | NaN | 518 | 's-Gravenhage | ZH | Zuid-Holland | STADSGEWEST HAAGLANDEN | Regionale Eenheid Den Haag | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 20150066103 | 2015328266 | NaN | NaN | NaN | NaN | 2015 | NaN | NaN | NaN | UMS | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4 | NaN | NaN | 0 | K | 5.0 | NaN | BI | 70.0 | 1.0 | 2.0 | NaN | 2.0 | NaN | NaN | NaN | R | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 211321184.0 | NaN | NaN | JTE0211321184 | NaN | 484 | Alphen aan den Rijn | ZH | Zuid-Holland | REGIO ZUID HOLLAND-OV | Regionale Eenheid Den Haag | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 20150066104 | 2015329512 | NaN | NaN | NaN | NaN | 2015 | NaN | NaN | NaN | UMS | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | NaN | NaN | 9 | E | 1.0 | NaN | BI | 60.0 | 3.0 | 2.0 | NaN | 2.0 | NaN | NaN | NaN | D | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 219288005.0 | NaN | WVK0219288005 | 12.0 | 1931 | Krimpenerwaard | ZH | Zuid-Holland | REGIO ZUID HOLLAND-OV | Regionale Eenheid Den Haag | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 20150066105 | 2015332880 | NaN | NaN | NaN | NaN | 2015 | NaN | NaN | NaN | LET | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3 | NaN | NaN | 0 | E | 1.0 | NaN | BI | 50.0 | 1.0 | NaN | NaN | 2.0 | NaN | NaN | NaN | R | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 159308166.0 | NaN | WVK0159308166 | 146.0 | 518 | 's-Gravenhage | ZH | Zuid-Holland | STADSGEWEST HAAGLANDEN | Regionale Eenheid Den Haag | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 20150066106 | 2015325816 | J | NaN | NaN | NaN | 2015 | NaN | NaN | NaN | LET | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | NaN | NaN | 9 | E | 5.0 | NaN | BI | 50.0 | 2.0 | 2.0 | NaN | 1.0 | NaN | NaN | NaN | D | NaN | 110.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 165310116.0 | NaN | NaN | JTE0165310116 | NaN | 518 | 's-Gravenhage | ZH | Zuid-Holland | STADSGEWEST HAAGLANDEN | Regionale Eenheid Den Haag | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
This data has been tagged. In the downloaded file structure there are referenced features. We can easily pick some out and see what data we are dealing with. First lets get a better look of the shape
print('Shape of dataset: {}'.format(df_acc15.shape))
Shape of dataset: (113289, 66)
Good to see we have another potentially satisfying dataset. These features will be my main point of interest:
AOL_ID to AOL_OMSAP3_CODE to AP3_OMSAP4_CODE to AP4_OMSAP5_CODE to AP5_OMSDDL_ID to DDL_OMSDAG_CODE to DAG_NAAMLKF_ID to LKF_OMSLKG_ID to LKG_OMSMND_NUMMER to MND_NAAMMNE_CODE to MNE_OMSNTT_CODE to NTT_OMSOTE_ID to OTE_OMSTDT_ID to TDT_OMSWSE_ID to WSE_OMSWVG_ID to WVG_OMS
Lets check if any of this data is actually usable by checking for null values.def return_null_count(dataframe):
print(dataframe+ ' : {}'.format(df_acc15[dataframe].isnull().sum()))
return_null_count('AOL_ID')
return_null_count('AP3_CODE')
return_null_count('AP4_CODE')
return_null_count('AP5_CODE')
return_null_count('DDL_ID')
return_null_count('DAG_CODE')
return_null_count('MND_NUMMER')
return_null_count('MNE_CODE')
return_null_count('WSE_ID')
return_null_count('WVG_ID')
#Present in partijen.txt dataset
#return_null_count('OTE_ID')
#Not present in dataset
#return_null_count('NTT_CODE')
#return_null_count('OTE_ID')
#return_null_count('TDT_ID')
#return_null_count('LKF_ID')
#return_null_count('LKG_ID')
AOL_ID : 0 AP3_CODE : 0 AP4_CODE : 113289 AP5_CODE : 113289 DDL_ID : 113289 DAG_CODE : 113289 MND_NUMMER : 113289 MNE_CODE : 113289 WSE_ID : 34829 WVG_ID : 36894
From this we can see that not many of the features are actually usable because they are null. Only usable features are:
AOL_IDAP3_CODEWSE_IDWVG_IDdf_aol = pd.read_csv('/Users/Matt/Desktop/AI/CHALLENGE1/Ongevallen2/ref/aardongevallen.txt',
low_memory=0,
sep=',',
header=0,
index_col=0
).iloc[:]
df_aol.head(10)
df_aol.sort_index(axis=0)
| AOL_OMS | |
|---|---|
| AOL_ID | |
| 0 | Onbekend |
| 1 | Voetganger |
| 2 | Geparkeerd voertuig |
| 3 | Dier |
| 4 | Vast voorwerp |
| 5 | Los voorwerp |
| 6 | Frontaal |
| 7 | Flank |
| 8 | Kop/staart |
| 9 | Eenzijdig |
df_afl = pd.read_csv('/Users/Matt/Desktop/AI/CHALLENGE1/Ongevallen2/ref/aflopen3.txt',
low_memory=0,
sep=','
#header=0,
#index_col=0)
).iloc[:]
df_afl.head()
| AP3_CODE | AP3_OMS | |
|---|---|---|
| 0 | DOD | Dodelijk |
| 1 | LET | Letsel |
| 2 | UMS | Uitsluitend materiele schade |
df_wse = pd.read_csv('/Users/Matt/Desktop/AI/CHALLENGE1/Ongevallen2/ref/wegsituaties.txt',
low_memory=0,
sep=','
#header=0,
#index_col=0)
).iloc[:]
#df_wse.head()
df_wvg = pd.read_csv('/Users/Matt/Desktop/AI/CHALLENGE1/Ongevallen2/ref/wegverhardingen.txt',
low_memory=0,
sep=','
#header=0,
#index_col=0)
).iloc[:]
#df_wvg.head()
Now lets seperate these columns from the main dataframe
df_select15 = df_acc15[['AOL_ID', 'AP3_CODE', 'WSE_ID', 'WVG_ID']]
df_select15
| AOL_ID | AP3_CODE | WSE_ID | WVG_ID | |
|---|---|---|---|---|
| 0 | 0 | UMS | 5.0 | 1.0 |
| 1 | 0 | UMS | 5.0 | 2.0 |
| 2 | 9 | UMS | 1.0 | 2.0 |
| 3 | 0 | LET | 1.0 | NaN |
| 4 | 9 | LET | 5.0 | 2.0 |
| ... | ... | ... | ... | ... |
| 113284 | 0 | UMS | 5.0 | 2.0 |
| 113285 | 0 | UMS | NaN | NaN |
| 113286 | 0 | UMS | NaN | NaN |
| 113287 | 0 | UMS | NaN | NaN |
| 113288 | 0 | UMS | NaN | NaN |
113289 rows × 4 columns
df_select15['AOL_ID'].replace([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], ['Onbekend', 'Voetganger', 'Geparkeerd voertuig', 'Dier', 'Vast voorwerp'
, 'Los voorwerp', 'Frontaal', 'Flank', 'Kop/staart', 'Eenzijdig'], inplace=True)
df_select15
C:\Users\Matt\anaconda3\lib\site-packages\pandas\core\series.py:4509: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy return super().replace(
| AOL_ID | AP3_CODE | WSE_ID | WVG_ID | |
|---|---|---|---|---|
| 0 | Onbekend | UMS | 5.0 | 1.0 |
| 1 | Onbekend | UMS | 5.0 | 2.0 |
| 2 | Eenzijdig | UMS | 1.0 | 2.0 |
| 3 | Onbekend | LET | 1.0 | NaN |
| 4 | Eenzijdig | LET | 5.0 | 2.0 |
| ... | ... | ... | ... | ... |
| 113284 | Onbekend | UMS | 5.0 | 2.0 |
| 113285 | Onbekend | UMS | NaN | NaN |
| 113286 | Onbekend | UMS | NaN | NaN |
| 113287 | Onbekend | UMS | NaN | NaN |
| 113288 | Onbekend | UMS | NaN | NaN |
113289 rows × 4 columns
fig, ax = plt.subplots(1, figsize=(15,10))
ax.set_title('Nature Of Traffic Incidents (The Netherlands, 2015)')
ax.set_xlabel(' Type ')
ax.set_ylabel(' Count ')
ax = df_select15['AOL_ID'].value_counts().head(10).plot(kind='bar')
From this we can tell alot of the data is onbekend a.k.a. unknown. Sadly we will not be able to use this dataset because of its lack of correlatable features.
The third dataset contains information about each traffic related death in the Netherlands between 2006 and 2012. Lets import the dataset and see what usable features we can pull out.
df_deadly = pd.read_csv('/Users/Matt/Desktop/AI/CHALLENGE1/dodelijke-verkeersongevallen.csv',
low_memory=0,
sep=','
#header=0,
#index_col=0)
).iloc[:]
df_deadly.head()
| Jaar | LEEFTIJD | GESLACHT | vervoerwijze | vervoerwijzefilter | wegvak/kruispunt | wegbeheerder | wegnummer | HECTOMETER | STRAAT1 | STRAAT2 | STRAAT3 | GEMEENTE | GEMEENTE 2013 | PROVINCIE | type locatie | X_COORD | Y_COORD | latitude | longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2006 | 67 | M | Personenauto | 1.0 | 32151001 | NaN | NaN | NaN | Retranchementseweg | Bosweg | GROENE WEG | Sluis Z | Sluis Z | Zeeland | W | 16483 | 375782 | 51.354251 | 3.398533 |
| 1 | 2006 | 9 | M | Personenauto | 1.0 | 47133006 | NaN | NaN | NaN | Buitendijk | Buitendijk | Galgenstraat | Sluis Z | Sluis Z | Zeeland | K | 23959 | 366700 | 51.274434 | 3.509164 |
| 2 | 2006 | 56 | M | Personenauto | 1.0 | 53155002 | NaN | NaN | 178,8 | Rijksweg | Middendijk | Schoondijkseweg | Sluis Z | Sluis Z | Zeeland | W | 26789 | 377298 | 51.370304 | 3.545870 |
| 3 | 2006 | 0 | V | Personenauto | 1.0 | 60143043 | NaN | NaN | 0,6 | Middenweg | Oranjestraat | Turkeijeweg | Sluis Z | Sluis Z | Zeeland | K | 30224 | 371642 | 51.320253 | 3.597172 |
| 4 | 2006 | 58 | V | Fiets | 2.0 | 61175056 | NaN | NaN | NaN | Bermweg | Kanaalstraat | Spoorstraat | Vlissingen | Vlissingen | Zeeland | K | 30524 | 387701 | 51.464611 | 3.595816 |
df_deadly.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 4402 entries, 0 to 4401 Data columns (total 20 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Jaar 4402 non-null int64 1 LEEFTIJD 4402 non-null int64 2 GESLACHT 4402 non-null object 3 vervoerwijze 4402 non-null object 4 vervoerwijzefilter 4399 non-null float64 5 wegvak/kruispunt 4402 non-null int64 6 wegbeheerder 1105 non-null object 7 wegnummer 394 non-null object 8 HECTOMETER 1092 non-null object 9 STRAAT1 4402 non-null object 10 STRAAT2 4394 non-null object 11 STRAAT3 4391 non-null object 12 GEMEENTE 4402 non-null object 13 GEMEENTE 2013 4402 non-null object 14 PROVINCIE 4402 non-null object 15 type locatie 4402 non-null object 16 X_COORD 4402 non-null int64 17 Y_COORD 4402 non-null int64 18 latitude 4402 non-null float64 19 longitude 4402 non-null float64 dtypes: float64(3), int64(5), object(12) memory usage: 687.9+ KB
df_deadly['type locatie'].value_counts()
W 2947 K 1454 L 1 Name: type locatie, dtype: int64
import plotly.express as px
fig = px.histogram(df_deadly, x='Jaar',labels={
'Jaar':'Year',
'TIME':'Country'},
title='Traffic deaths per year the Netherlands 2006-2012')
fig.update_layout(bargap=0.3)
fig.show()
fig = px.histogram(df_deadly, x='LEEFTIJD',labels={
'LEEFTIJD':'Age',
'count':'Country'},
title='Age distribution of traffic deaths in the Netherlands 2006-2012', nbins=100)
#fig.update_layout(bargap=0.3)
fig.show()
import plotly.graph_objects as go
fig = px.histogram(df_deadly, x='LEEFTIJD', color='vervoerwijze', nbins=100,
labels={
'LEEFTIJD': 'AGE',
'vervoerwijze': 'vehicle type'
},
title='Vehicle distribution in deadly traffic accidents by age in the Netherlands 2006-2012')
fig.show()
fig = px.pie(df_deadly, names='PROVINCIE',
title='Distribution of deadly traffic accidents by province in the Netherlands 2006-2012')
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()
Here we have a very nice and saturated dataset we can use. Lets display this data in a nice form. Starting with the age.
fig, ax = plt.subplots(1, figsize=(20,10))
ax.set_title('Deadly traffic accidents by age')
ax.set_xlabel(' Age ')
ax.set_ylabel(' Count ')
ax = df_deadly['LEEFTIJD'].plot(kind='hist', bins=100)
df_total_deaths_eu = pd.read_excel('/Users/Matt/Desktop/AI/CHALLENGE1/deaths_eu_vehicle_cat_2011_2019.xlsx', sheet_name='Sheet 1',
header=8,
index_col=0
).iloc[:]
df_total_deaths_eu.head(5)
df_total_deaths_eu = df_total_deaths_eu.transpose()
df_total_deaths_eu.head(5)
C:\Users\Matt\anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py:221: UserWarning: Workbook contains no default style, apply openpyxl's default
| TIME | GEO (Labels) | European Union - 27 countries (from 2020) | European Union - 28 countries (2013-2020) | Belgium | Bulgaria | Czechia | Denmark | Germany (until 1990 former territory of the FRG) | Estonia | Ireland | Greece | Spain | France | Croatia | Italy | Cyprus | Latvia | Lithuania | Luxembourg | Hungary | Malta | Netherlands | Austria | Poland | Portugal | Romania | Slovenia | Slovakia | Finland | Sweden | Iceland | Liechtenstein | Norway | Switzerland | United Kingdom | Turkey | NaN | Special value | : |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1999 | NaN | 54680 | 58244 | 1397 | 1047 | 1455 | 514 | 7772 | 232 | 414 | 2116 | 5709 | 8462 | 662 | 6688 | 113 | 652 | 748 | 58 | 1306 | 4 | 1090 | 1079 | 6730 | 1972 | 2468 | 334 | 647 | 431 | 580 | 21 | 0 | 304 | 583 | 3564 | : | NaN | NaN | not available |
| 2000 | NaN | 53410 | 56990 | 1470 | 1012 | 1486 | 498 | 7503 | 204 | 418 | 2037 | 5746 | 8059 | 655 | 7061 | 111 | 635 | 641 | 76 | 1200 | 15 | 1082 | 976 | 6294 | 1836 | 2466 | 314 | 628 | 396 | 591 | 32 | 3 | 341 | 592 | 3580 | : | NaN | NaN | NaN |
| 2001 | NaN | 51282 | 54880 | 1486 | 1011 | 1333 | 431 | 6977 | 199 | 412 | 1880 | 5478 | 8136 | 647 | 7096 | 98 | 558 | 706 | 70 | 1239 | 16 | 993 | 958 | 5534 | 1655 | 2450 | 278 | 625 | 433 | 583 | 24 | 2 | 275 | 544 | 3598 | : | NaN | NaN | NaN |
| 2002 | NaN | 50343 | 53924 | 1306 | 959 | 1430 | 463 | 6842 | 223 | 378 | 1634 | 5312 | 7630 | 627 | 6980 | 94 | 559 | 697 | 62 | 1429 | 16 | 987 | 956 | 5826 | 1653 | 2411 | 269 | 625 | 415 | 560 | 29 | 0 | 310 | 513 | 3581 | : | NaN | NaN | NaN |
| 2003 | NaN | 47331 | 50989 | 1213 | 960 | 1447 | 432 | 6613 | 164 | 337 | 1605 | 5373 | 6034 | 701 | 6563 | 97 | 532 | 709 | 53 | 1326 | 16 | 1028 | 931 | 5642 | 1523 | 2229 | 242 | 653 | 379 | 529 | 23 | 5 | 280 | 546 | 3658 | : | NaN | NaN | NaN |
df_passengercar_deaths_eu = pd.read_excel('/Users/Matt/Desktop/AI/CHALLENGE1/deaths_eu_vehicle_cat_2011_2019.xlsx', sheet_name='Sheet 14',
header=8,
index_col=0
).iloc[:]
df_passengercar_deaths_eu = df_passengercar_deaths_eu.transpose()
df_passengercar_deaths_eu.head(5)
C:\Users\Matt\anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py:221: UserWarning: Workbook contains no default style, apply openpyxl's default
| TIME | GEO (Labels) | European Union - 27 countries (from 2020) | European Union - 28 countries (2013-2020) | Belgium | Bulgaria | Czechia | Denmark | Germany (until 1990 former territory of the FRG) | Estonia | Ireland | Greece | Spain | France | Croatia | Italy | Cyprus | Latvia | Lithuania | Luxembourg | Hungary | Malta | Netherlands | Austria | Poland | Portugal | Romania | Slovenia | Slovakia | Finland | Sweden | Iceland | Liechtenstein | Norway | Switzerland | United Kingdom | Turkey | NaN | Special value | : |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1999 | NaN | : | : | 851 | : | 775 | 271 | : | : | 236 | 886 | 3186 | 5454 | : | 3796 | : | : | : | 49 | : | : | 540 | 615 | : | 816 | 914 | : | : | 251 | 372 | : | : | : | : | 1763 | : | NaN | NaN | not available |
| 2000 | NaN | : | : | 922 | : | 784 | 235 | 4396 | : | 262 | 922 | 3285 | 5291 | : | 3850 | : | : | : | 54 | : | : | 513 | 549 | 2710 | 730 | 903 | 126 | : | 224 | 393 | : | : | : | : | 1770 | : | NaN | NaN | NaN |
| 2001 | NaN | : | : | 899 | : | 715 | 242 | 4023 | : | 231 | 803 | 3140 | 5284 | : | 3847 | : | : | : | 51 | : | : | 477 | 570 | 2438 | 638 | 933 | 107 | 304 | 262 | 373 | : | : | 167 | 245 | 1816 | : | NaN | NaN | NaN |
| 2002 | NaN | : | : | 779 | : | 759 | 246 | 4005 | : | 202 | 793 | 3102 | 4862 | : | 3653 | : | : | : | 52 | : | : | 479 | 524 | 2548 | 706 | 874 | 124 | 304 | 267 | 379 | 26 | : | 189 | 274 | 1832 | : | NaN | NaN | NaN |
| 2003 | NaN | : | : | 688 | : | 798 | 236 | 3774 | : | 174 | 761 | 3212 | 3685 | : | 3377 | : | : | : | 33 | 640 | : | 483 | 524 | 2541 | 627 | 856 | 102 | 306 | 217 | 349 | 17 | : | 170 | 260 | 1841 | : | NaN | NaN | NaN |
import plotly.express as px
fig = px.line(df_total_deaths_eu.iloc[:, 3:-3],
labels={'value':'Deaths',
'index':'Year',
'TIME':'Country'},
title='Deaths in traffic related accidents EU 1999-2019 (28 countries)')
fig.show()
fig = px.line(df_total_deaths_eu.iloc[:, 1:3],
labels={'value':'Deaths',
'index':'Year',
'TIME':'Country'},
title='Total deaths in traffic related accidents EU 1999-2019')
fig.show()
fig = px.histogram(df_total_deaths_eu.iloc[:, -18], x=df_total_deaths_eu.index, y='Netherlands' ,labels={
'index':'Year',
'sum of Netherlands':'Deaths'},
title='Traffic deaths per year the Netherlands 1999-2019')
fig.update_layout(bargap=0.3)
fig.show()
df3_ong.info(verbose=True, show_counts=True)
<class 'pandas.core.frame.DataFrame'> RangeIndex: 122051 entries, 0 to 122050 Data columns (total 116 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Longitude 122051 non-null float64 1 Latitude 122051 non-null float64 2 OngevalID 122051 non-null int64 3 Communicatie_Ref 122049 non-null object 4 ProcesverbaalOpgem 23414 non-null object 5 Afloop3 122051 non-null object 6 AantalPartijen 122051 non-null int64 7 Aard 122051 non-null object 8 GekoppledNiveau 122051 non-null object 9 Wegsituatie 122049 non-null object 10 WegsituatieAnders 0 non-null float64 11 BebouwdeKom 121984 non-null object 12 AangegevenMaxSnelheid 118911 non-null float64 13 Wegverlichting 114805 non-null object 14 Wegverharding 120038 non-null object 15 Wegverharding_Anders 1191 non-null object 16 Wegdekken 115844 non-null object 17 Wegdekken_Anders 596 non-null object 18 Lichtgesteldheid 121982 non-null object 19 Zichtafstand 34 non-null object 20 Weergesteldheid1 122032 non-null object 21 Weergesteldheid2 387 non-null object 22 BijzonderheidTypeVerkeersmaatregel1 30026 non-null object 23 BijzonderheidTypeVerkeersmaatregel2 6700 non-null object 24 BijzonderheidTypeVerkeersmaatregel3 307 non-null object 25 BijzonderheidTypeVerkeersmaatregel_An 2502 non-null object 26 BijzonderheidTypeInfrastructuur1 5075 non-null object 27 BijzonderheidTypeInfrastructuur2 38 non-null object 28 BijzonderheidTypeInfrastructuur3 0 non-null float64 29 BijzonderheidTypeInfrastructuur_An 4726 non-null object 30 BijzonderheidTidelijkAard1 6178 non-null object 31 BijzonderheidTidelijkAard2 261 non-null object 32 BijzonderheidTidelijkAard3 3 non-null object 33 BijzonderheidTidelijkAard_An 297 non-null object 34 JunctieID 54454 non-null float64 35 WegvakID 67597 non-null float64 36 Hectometer 18205 non-null float64 37 Huisnummer 11380 non-null object 38 GemeenteID 122051 non-null int64 39 GemeenteNaam 122051 non-null object 40 ProvincieNaam 122051 non-null object 41 KaderwetgebiedNaam 102930 non-null object 42 PolitieDistrictNaam 122051 non-null object 43 BasiseenheidNaam 54506 non-null object 44 WijkteamNaam 53157 non-null object 45 WaterschaapNaam 15148 non-null object 46 WijkNaam 8094 non-null object 47 XCoordinate 122051 non-null float64 48 YCoordinate 122051 non-null float64 49 PartijID 122051 non-null int64 50 Volgnummer 122051 non-null int64 51 Doorrijder 14300 non-null object 52 ObjectType 121198 non-null object 53 ObjectType_Anders 852 non-null object 54 VoertuigVerzekerd 74553 non-null object 55 Schade 94505 non-null object 56 AanhangenWagen 2053 non-null object 57 GevaarlijkStoffenPlaat 16 non-null object 58 VoertuigVerlichting 122051 non-null object 59 AantalPassagiers 0 non-null float64 60 Geslacht_BstuurdrOfVoetgngr 24140 non-null object 61 RibewijsGeldg 73411 non-null object 62 RijbewijsCategry 73428 non-null object 63 RibewijsBeginnr 130 non-null object 64 BromfietsCertificate 1033 non-null object 65 VastgelegdePlaats1 112921 non-null object 66 VastgelegdePlaats2 0 non-null float64 67 VastgelegdePlaats_An 1874 non-null object 68 VoorgenomenBeweging 92659 non-null object 69 AangrijppuntType 88843 non-null object 70 Aangrijppunt1 86822 non-null object 71 Aangrijppunt2 0 non-null float64 72 Beweging1 83438 non-null object 73 Beweging2 530 non-null object 74 Beweging_An 50 non-null object 75 Inrichting 68557 non-null object 76 EersteToelatingDatum 70389 non-null float64 77 MassaLeegVoertuig 69716 non-null float64 78 Breedte 11809 non-null float64 79 Lengte 2830 non-null float64 80 APKGekeurd 55192 non-null object 81 Wegvak_BeginDat 67597 non-null float64 82 Wegvak_EndDat 959 non-null float64 83 WegBeheerder 67597 non-null object 84 WegNummer 23984 non-null object 85 WegdeelLetter 23984 non-null object 86 HectoLetter 2087 non-null object 87 Baansubsoort 24976 non-null object 88 RelatievePositie 24512 non-null object 89 WegvakRichting 17097 non-null object 90 StraatType 67597 non-null object 91 StraatNaam 67597 non-null object 92 WoonplaatsNaam 67597 non-null object 93 LinkerkantHuisNummer 33371 non-null object 94 RechterKantHuisNummer 33360 non-null object 95 EerstHuisNummerLinks 23687 non-null float64 96 EerstHuisNummerRechts 23811 non-null float64 97 LaatsHuisNummerLinks 23687 non-null float64 98 LaatsHuisNummerRechts 23811 non-null float64 99 KlokAanduingAlsBegin 67597 non-null float64 100 KlokAanduingAlsEnd 67597 non-null float64 101 SamengesteldLocatieType_Wegvak 8192 non-null object 102 SamengesteldLocatieNummer_Wegvak 8192 non-null float64 103 BeginpuntAfstand 23955 non-null float64 104 EndpuntAfstand 23955 non-null float64 105 BegKM 23955 non-null float64 106 EndKM 23955 non-null float64 107 WegbeheerderSortRijk 54454 non-null object 108 WegbeheerderSortProvincie 54454 non-null object 109 WegbeheerderSortGemeente 54454 non-null object 110 WegbeheerderSortWaterSchap 54454 non-null object 111 WegbeheerderSortOverig 54454 non-null object 112 AantalWegvakkenMetJunctie 54454 non-null float64 113 ActueellWegvakkenMetJunctie 54454 non-null object 114 SamengesteldLocatieType_Junctie 20703 non-null object 115 SamengesteldLocatieNummer_Junctie 20703 non-null float64 dtypes: float64(32), int64(5), object(79) memory usage: 108.0+ MB
data = [df3_ong['Afloop3'],
df3_ong['Wegsituatie'],
df3_ong['ProvincieNaam'],
df3_ong['Longitude'],
df3_ong['Latitude']]
headers = ['Resulting damage', 'RoadType', 'Province', 'Longitude', 'Latitude']
df3_eng = pd.concat(data, axis=1, keys=headers)
df3_eng[['Resulting damage', 'RoadType', 'Province', 'Longitude', 'Latitude']] = df3_eng[['Resulting damage', 'RoadType', 'Province', 'Longitude', 'Latitude']].replace(np.nan, 'Unknown')
df3_eng.head()
| Resulting damage | RoadType | Province | Longitude | Latitude | |
|---|---|---|---|---|---|
| 0 | Letsel | Rechte weg | Zeeland | 3.377269 | 51.303095 |
| 1 | Letsel | Rechte weg | Zuid-Holland | 4.600274 | 51.868538 |
| 2 | Uitsluitend materiele schade | Rechte weg | Overijssel | 6.927726 | 52.315446 |
| 3 | Uitsluitend materiele schade | Rechte weg | Zeeland | 3.376508 | 51.357399 |
| 4 | Uitsluitend materiele schade | Bocht | Zeeland | 3.376508 | 51.357399 |
df3_eng[df3_eng['Resulting damage'].isin(['Dodelijk'])]
| Resulting damage | RoadType | Province | Longitude | Latitude | |
|---|---|---|---|---|---|
| 20 | Dodelijk | Rechte weg | Zeeland | 3.398530 | 51.354252 |
| 87 | Dodelijk | Rechte weg | Zeeland | 3.545867 | 51.370306 |
| 393 | Dodelijk | Rechte weg, gescheiden rijbanen | Zeeland | 3.691119 | 51.502981 |
| 402 | Dodelijk | Rechte weg, gescheiden rijbanen | Zeeland | 3.688098 | 51.477851 |
| 404 | Dodelijk | Rechte weg | Zeeland | 3.682664 | 51.560152 |
| ... | ... | ... | ... | ... | ... |
| 121740 | Dodelijk | Kruispunt, 3 takken | Zeeland | 3.618046 | 51.495475 |
| 121847 | Dodelijk | Kruispunt, 3 takken | Zeeland | 3.879723 | 51.512146 |
| 121933 | Dodelijk | Kruispunt, 4 takken | Zeeland | 3.932732 | 51.480958 |
| 122029 | Dodelijk | Rotonde | Zeeland | 3.876649 | 51.492496 |
| 122032 | Dodelijk | Kruispunt, 4 takken | Zeeland | 3.893548 | 51.510166 |
678 rows × 5 columns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
#make subplots
fig = make_subplots(rows=1, cols=2, specs=[[{'type':'pie'}, {'type':'bar'}]])
#make trace
fig.add_trace(go.Pie(labels=df3_eng['Resulting damage'],
pull=[0.05, 0, 0.1],
textinfo='label+percent',
marker_colors=['orange', 'red', 'green']
),1,1)
fig.add_trace(go.Bar(x=['Uitsluitend materiele schade','Letsel', 'Dodelijk'],
y=df3_eng['Resulting damage'].value_counts(),
marker_color=['green', 'orange', 'red']
),1,2)
fig.update_layout(
title_text='Distribution of death, injury and damage in traffic accidents the Netherlands 2006')
fig.show()
fig = px.histogram(df3_eng, x='RoadType',
color='Resulting damage',
color_discrete_map = {'Uitsluitend materiele schade':'green','Letsel':'orange','Dodelijk':'red'},
labels={
},
title='Traffic accident related death, injury and damage rates per road type the Netherlands 2006')
fig.update_layout(barmode='group',xaxis={'categoryorder':'total descending'},)
fig.show()
deadly06 = df3_eng[df3_eng['Resulting damage'].isin(['Dodelijk'])]
deadly06
| Resulting damage | RoadType | Province | Longitude | Latitude | |
|---|---|---|---|---|---|
| 20 | Dodelijk | Rechte weg | Zeeland | 3.398530 | 51.354252 |
| 87 | Dodelijk | Rechte weg | Zeeland | 3.545867 | 51.370306 |
| 393 | Dodelijk | Rechte weg, gescheiden rijbanen | Zeeland | 3.691119 | 51.502981 |
| 402 | Dodelijk | Rechte weg, gescheiden rijbanen | Zeeland | 3.688098 | 51.477851 |
| 404 | Dodelijk | Rechte weg | Zeeland | 3.682664 | 51.560152 |
| ... | ... | ... | ... | ... | ... |
| 121740 | Dodelijk | Kruispunt, 3 takken | Zeeland | 3.618046 | 51.495475 |
| 121847 | Dodelijk | Kruispunt, 3 takken | Zeeland | 3.879723 | 51.512146 |
| 121933 | Dodelijk | Kruispunt, 4 takken | Zeeland | 3.932732 | 51.480958 |
| 122029 | Dodelijk | Rotonde | Zeeland | 3.876649 | 51.492496 |
| 122032 | Dodelijk | Kruispunt, 4 takken | Zeeland | 3.893548 | 51.510166 |
678 rows × 5 columns
fig = px.histogram(deadly06, x='RoadType', color='Resulting damage',
color_discrete_map = {'Uitsluitend materiele schade':'green','Letsel':'orange','Dodelijk':'red'},
title='Total deaths per traffic incident by type of road the Netherlands 2006')
fig.update_layout(barmode='group',xaxis={'categoryorder':'total descending'},)
fig.show()
fig = px.histogram(deadly06, x='Province', color='RoadType',
color_discrete_map = {'Rechte weg': '#003f5c',
'Bocht':'#2f4b7c',
'Rechte weg, gescheiden rijbanen':'#665191',
'Rotonde':'#a05195',
'Uitvoegstrook op (auto)snelweg':'#d45087',
'Invoegstrook op (auto)snelweg':'#f95d6a',
'Kruispunt, 3 takken':'#ff7c43',
'Kruispunt, 4 takken':'#ffa600',},
title='Total deaths per traffic incident by type of road the Netherlands 2006')
fig.update_layout(barmode='group',xaxis={'categoryorder':'total descending'},)
fig.show()
import folium
a = deadly06[['Latitude', 'Longitude', 'RoadType']]
m = folium.Map(location=[52, 5.398530], zoom_start=7, tiles='cartodbpositron')
type_color = {'Rechte weg': '#003f5c',
'Bocht':'#2f4b7c',
'Rechte weg, gescheiden rijbanen':'#665191',
'Rotonde':'#a05195',
'Uitvoegstrook op (auto)snelweg':'#d45087',
'Invoegstrook op (auto)snelweg':'#f95d6a',
'Kruispunt, 3 takken':'#ff7c43',
'Kruispunt, 4 takken':'#ffa600',}
for index, row in a.iterrows():
folium.CircleMarker(location=[row.Latitude, row.Longitude],
radius=2,
color=type_color[row.RoadType],
fill=True,
fill_color="#3186cc",
).add_to(m)
#m
data_v = [df3_ong['Afloop3'],
df3_ong['Inrichting'],
df3_ong['EersteToelatingDatum'],
df3_ong['MassaLeegVoertuig'],
df3_ong['Breedte'],
df3_ong['Lengte'],
df3_ong['APKGekeurd'],
df3_ong['ObjectType'],]
headers_v = ['RD', 'VType', 'VImp', 'VM', 'VW', 'VL', 'APK', 'OBJ']
vehicle06 = pd.concat(data_v, axis=1, keys=headers_v)
vehicle06['VImp'] = pd.to_datetime(vehicle06['VImp'], format='%Y%m%d')
vehicle06.head()
| RD | VType | VImp | VM | VW | VL | APK | OBJ | |
|---|---|---|---|---|---|---|---|---|
| 0 | Letsel | gesloten wagen | 2001-10-04 | 2245.0 | 193.0 | NaN | J | Bestelauto |
| 1 | Letsel | sedan met achterklep | 2005-11-30 | 935.0 | NaN | NaN | NaN | Personenauto |
| 2 | Uitsluitend materiele schade | NaN | 2000-04-18 | 188.0 | NaN | NaN | NaN | Motor |
| 3 | Uitsluitend materiele schade | NaN | NaT | NaN | NaN | NaN | NaN | Personenauto |
| 4 | Uitsluitend materiele schade | sedan | 1996-01-05 | 1395.0 | NaN | NaN | J | Personenauto |
vehicle06['OBJ'] = vehicle06['OBJ'].replace(np.nan, 'Unk')
vehicle06['VType'] = vehicle06['VType'].replace(np.nan, 'Unk')
fig = px.histogram(vehicle06, x='VType', color='OBJ',
title='Vehicle type involved in accidents by object type in the Netherlands 2006')
fig.update_layout(xaxis={'categoryorder':'total descending'},)
fig.show()
data4 = [df3_ong['Afloop3'],
df3_ong['Geslacht_BstuurdrOfVoetgngr'],
df3_ong['RibewijsGeldg'],
df3_ong['RijbewijsCategry'],
df3_ong['RibewijsBeginnr'],
df3_ong['BromfietsCertificate'],
df3_ong['AantalPassagiers'],
df3_ong['Doorrijder']]
headers4 = ['RD', 'G', 'LValid', 'LCat', 'LBgnr', 'LScoot', 'NPass', 'Run']
pass04 = pd.concat(data4, axis=1, keys=headers4)
pass04[['RD', 'LValid', 'LCat', 'G', 'LBgnr', 'Run']] = pass04[['RD', 'LValid', 'LCat', 'G', 'LBgnr', 'Run']].replace(np.nan, 'Unknown')
pass04.head()
| RD | G | LValid | LCat | LBgnr | LScoot | NPass | Run | |
|---|---|---|---|---|---|---|---|---|
| 0 | Letsel | M | J | B | Unknown | NaN | NaN | Unknown |
| 1 | Letsel | V | J | B,E | Unknown | NaN | NaN | Unknown |
| 2 | Uitsluitend materiele schade | Unknown | J | A1 | Unknown | NaN | NaN | Unknown |
| 3 | Uitsluitend materiele schade | Unknown | Unknown | Unknown | Unknown | NaN | NaN | J |
| 4 | Uitsluitend materiele schade | Unknown | J | B | Unknown | NaN | NaN | Unknown |
fig = px.histogram(pass04, x='LCat', color='G',
title='License type per gender involved in accidents in the Netherlands 2006',
labels={'G':'Gender'})
fig.update_layout(barmode='group',xaxis={'categoryorder':'total descending'},)
fig.show()
fig = px.histogram(pass04, x='RD',
color='Run',
title='Hit and run rate in accidents per damage type in the Netherlands 2006',
)
fig.update_layout(barmode='group',xaxis={'categoryorder':'total descending'},)
fig.show()
data5 = [df3_ong['Afloop3'],
df3_ong['BebouwdeKom'],
df3_ong['AangegevenMaxSnelheid'],
df3_ong['Wegverlichting'],
df3_ong['Wegverharding'],
df3_ong['Lichtgesteldheid'],
df3_ong['Zichtafstand'],
df3_ong['Weergesteldheid1'],]
headers5 = ['RD', 'CityZone', 'MaxSpd', 'StrtLight', 'StrtMat', 'NatLight', 'Vis', 'Weath']
lim06 = pd.concat(data5, axis=1, keys=headers5)
lim06[['RD', 'CityZone', 'MaxSpd', 'StrtLight', 'StrtMat', 'NatLight', 'Vis', 'Weath']] = lim06[['RD', 'CityZone', 'MaxSpd', 'StrtLight', 'StrtMat', 'NatLight', 'Vis', 'Weath']].replace(np.nan, 'Unknown')
lim06.head()
| RD | CityZone | MaxSpd | StrtLight | StrtMat | NatLight | Vis | Weath | |
|---|---|---|---|---|---|---|---|---|
| 0 | Letsel | Buiten | 60.0 | Niet aanwezig | Overig asfalt | Daglicht | Unknown | Droog |
| 1 | Letsel | Binnen | 50.0 | Brandend | Klinkers | Daglicht | Unknown | Regen |
| 2 | Uitsluitend materiele schade | Binnen | 50.0 | Niet brandend | Overig asfalt | Daglicht | Unknown | Droog |
| 3 | Uitsluitend materiele schade | Buiten | 60.0 | Niet brandend | Overig asfalt | Daglicht | Unknown | Droog |
| 4 | Uitsluitend materiele schade | Buiten | 60.0 | Niet aanwezig | Overig asfalt | Duisternis | Unknown | Droog |
fig = px.histogram(lim06, x='MaxSpd',
color='RD',
color_discrete_map = {'Uitsluitend materiele schade':'green','Letsel':'orange','Dodelijk':'red'},
title='Location per damage type in the Netherlands 2006',
)
fig.update_layout(barmode='group',xaxis={'categoryorder':'total descending'},)
fig.show()
data2 = [df3_ong['Afloop3'],
df3_ong['ObjectType'],
df3_ong['ObjectType_Anders'],
df3_ong['VoertuigVerzekerd'],
df3_ong['Schade'],
df3_ong['AanhangenWagen'],
df3_ong['VoertuigVerlichting'],
df3_ong['GevaarlijkStoffenPlaat']]
headers2 = ['RD', 'Type', 'TypeOther', 'Ens', 'DMG', 'TRL', 'LGHT', 'DNGSub']
df3_vehicle = pd.concat(data2, axis=1, keys=headers2)
df3_vehicle.head()
| RD | Type | TypeOther | Ens | DMG | TRL | LGHT | DNGSub | |
|---|---|---|---|---|---|---|---|---|
| 0 | Letsel | Bestelauto | NaN | J | J | J | Onbekend | NaN |
| 1 | Letsel | Personenauto | NaN | J | J | NaN | Onbekend | NaN |
| 2 | Uitsluitend materiele schade | Motor | NaN | J | J | NaN | Onbekend | NaN |
| 3 | Uitsluitend materiele schade | Personenauto | NaN | NaN | J | NaN | Onbekend | NaN |
| 4 | Uitsluitend materiele schade | Personenauto | NaN | J | J | NaN | Onbekend | NaN |
df3_vehicle['Type'].value_counts()
Personenauto 63555 Bestelauto 8528 Overig wegmeubilair 8273 Onbekend voertuig i.g.v. doorrijder 6467 Fiets 5556 Overig vast object 5113 Lichtmast 3711 Bromfiets 3242 Boom 2920 Vrachtauto 2651 Dier 2560 Motor 1773 Trekker 1379 Bus 1026 Los voorwerp 1009 Snorfiets 905 Voetganger 828 Trekker met oplegger 648 Landbouwvoertuig 513 Brommobiel 333 Trein/tram 208 Name: Type, dtype: int64
fig = px.histogram(df3_vehicle, x='Type',
title='Object type involved in accidents in the Netherlands 2006')
fig.update_layout(barmode='group',xaxis={'categoryorder':'total descending'},)
fig.show()
deadly_veh06 = df3_vehicle[df3_vehicle['RD'].isin(['Dodelijk'])]
fig = px.histogram(deadly_veh06, x='Type',
color='RD',
color_discrete_map = {'Uitsluitend materiele schade':'green','Letsel':'orange','Dodelijk':'red'},
title='Deaths per object type involved in accidents in the Netherlands 2006')
fig.update_layout(barmode='group',xaxis={'categoryorder':'total descending'},)
fig.show()
df3_vehicle['Ens'] = df3_vehicle['Ens'].replace(np.nan, 'Unk')
fig = px.histogram(df3_vehicle, x='Type',
color='Ens',
color_discrete_map = {'J':'green','Unk':'orange','N':'red'},
title='Insurance rate per object type involved in accidents in the Netherlands 2006')
fig.update_layout(barmode='group',xaxis={'categoryorder':'total descending'},)
#fig.show()
verkeers doden https://www.cbs.nl/nl-nl/nieuws/2019/16/11-procent-meer-verkeersdoden-in-2018
Doden 2006-2012 https://ckan.dataplatform.nl/dataset/dodelijke-verkeersongevallen/resource/ff887c9b-c5fc-4e88-96d9-b435b170d4b6
vehicle by road type https://ec.europa.eu/eurostat/databrowser/view/SDG_11_40__custom_1650876/default/table?lang=en
deaths in eu by type of vehicle https://ec.europa.eu/eurostat/databrowser/view/tran_sf_roadve/default/table?lang=en
victims by road accidents https://ec.europa.eu/eurostat/databrowser/view/TRAN_R_ACCI__custom_1650879/default/table?lang=en